A high-fidelity residential building occupancy detection data set



Scientific Data volume 8, Article number: 280 (2021)

This article describes the development of a data acquisition system used to capture a series of occupancy-related modalities from single-family homes, as well as the resulting data set. The publicly available data set includes: grayscale images at 32 × 32 pixels, captured every second; audio files, which have been processed to remove personally identifiable information; indoor environmental readings, captured every ten seconds; and the ground truth binary occupancy status. The data acquisition system, termed the mobile human presence detection (HPDmobile) system, was deployed in six homes for a minimum of one month each, and captured all modalities simultaneously from at least four different locations inside each home. The environmental modalities are available as captured, but to protect the privacy and identity of the occupants, the images were downsized and the audio files underwent a series of processing steps, as described in this article. This data set adds to the small body of existing data and has applications in energy efficiency and indoor environmental quality research.

A machine-accessible metadata file describing the reported data is available at: https://doi.org/10.6084/m9.figshare.14920131

Occupancy detection in buildings is an important strategy for reducing overall energy consumption. In 2020, residential energy consumption accounted for 22% of the 98 EJ consumed by U.S. end-use sectors (primary energy use plus electricity purchased from the power sector)1, and about 50% of that can be attributed to heating, ventilation, and air conditioning (HVAC) use2. Studies using PIR sensors and smart thermostats have shown that residential energy use can be reduced by 15-47% by considering occupancy in HVAC operations3,4,5. Other studies have shown that including occupancy information in model predictive control strategies can reduce residential energy use by 13-39%6,7. Although these reductions are not achievable in all climates, since humidity or freezing risks may require HVAC equipment to operate during unoccupied hours, a moderate temperature setback informed by vacancy can still yield savings. Other benefits of residential occupancy detection include enhanced occupant comfort, home security, and home health applications8.

Historically, occupancy detection has mainly been limited to passive infrared (PIR), ultrasonic, or dual-technology sensing systems, but the extensive body of research on new occupancy detection methods, reviewed and summarized elsewhere8,9, makes clear the need for improved occupancy detection technologies. Newer approaches include camera technologies with computer vision10, sensor fusion techniques11, occupant tracking methods12, and occupancy models13,14. Many of these strategies are based on machine learning techniques15 and usually require large amounts of labeled training data.

At present, the authors are aware of only three publicly available data sets that the research community can use to develop and test the effectiveness of residential occupancy detection algorithms: UCI16, ECO17, and the ecobee Donate Your Data (DYD) data set18. The UCI data set captures temperature, relative humidity, light level, and CO2 as features recorded at one-minute intervals. The ECO data set captures power consumption at one-second intervals. The DYD data are collected from ecobee thermostats and include environmental and system measurements such as: runtimes of heating and cooling sources, indoor and outdoor relative humidity and temperature readings, detected motion, and thermostat schedules and setpoints. Although all of these data sets are useful to the community, none contain ground truth occupancy information, which is crucial for the development of accurate occupancy detection algorithms. In addition, other indoor sensing modalities not captured by these data sets are desirable. For example, both images and audio can provide strong indications of human presence. Although many data sets exist for object (person) detection, person recognition, and people counting in commercial spaces19,20,21, the authors know of no publicly available data sets that capture these modalities in residential spaces. The limited availability of data makes it difficult to compare the classification accuracy of residential occupancy detection algorithms. It is understandable, however, why no data set containing images and audio exists, as privacy concerns make these types of data difficult to capture and publish22. A data set containing privacy-preserved audio and images from homes is therefore a novel contribution, providing an additional resource for the building research community to train, test, and compare occupancy detection algorithms. The inherent difficulty of obtaining these sensitive data makes the data set unique and adds to the sparse body of existing residential occupancy data.

The data collection described in this article was performed for a research project funded by the Advanced Research Projects Agency-Energy (ARPA-E). The project is part of the Saving Energy Nationwide in Structures with Occupancy Recognition (SENSOR) program launched in 2017, which aims to "develop user-transparent sensor systems that accurately quantify the presence of humans to significantly reduce energy use in commercial and residential buildings"23. Our team focused specifically on residential buildings, and we are using the captured data to inform the development of machine learning algorithms and of new RFID-based wireless and battery-less hardware for occupancy detection. The data we collected build on the UCI data set by capturing the same environmental modalities, while also capturing privacy-preserved images and audio. This data descriptor describes the system used to capture the information, the processing techniques used to protect occupant privacy, and the final open-source data set available to the public. For a summary of the captured and available modalities, see Table 1.

Time series data related to occupancy were captured from six different residences in Boulder, Colorado, over the course of a year. The residences housed individuals and couples in studio, one-bedroom, and two-bedroom apartments, and families and roommates in three-bedroom apartments and single-family houses. The houses and apartments tested are of standard construction, representative of the area's building stock, built between the 1960s and the early 2000s. Boulder has a temperate climate, with an average annual precipitation of 54 cm, falling as rain in summer and snow in winter. According to a report from the National Oceanic and Atmospheric Administration (NOAA) (https://psl.noaa.gov/boulder), the average minimum and maximum temperatures in the region are -6 °C and 31 °C.

The initially captured modalities are: monochrome images with a resolution of 336 × 336 pixels; 10-second 18-bit audio files recorded at a sampling frequency of 8 kHz; indoor temperature readings in °C; indoor relative humidity (rH) readings in %; indoor equivalent carbon dioxide (eCO2) readings in parts per million (ppm); indoor total volatile organic compound (TVOC) readings in parts per billion (ppb); and light levels in illuminance (lux). Images were captured at a rate of one frame per second, and all environmental readings were captured every 10 seconds. The 10-second sampling frequency of the environmental sensors is higher than needed to capture dynamics such as temperature changes, but was chosen to give researchers flexibility in choosing their own down-sampling methods, and to potentially capture occupancy-related events such as lights being turned on. Audio files were captured continuously, generating 8,640 audio files per day. Also collected and included in the data set is ground truth occupancy information, comprising a binary (occupied/vacant) status and an estimate of the number of occupants in the home at a given time. The reported binary status has been verified, while the occupant count has not been verified and should be treated only as an estimate. In addition, region-based labels are provided for the images, with a binary flag indicating whether each image shows a person. These labels were generated automatically using a pre-trained detection model. Due to the sheer volume of data, the image labels have not been fully verified; instead, they were spot-checked, and indicators of label accuracy are provided.

Readers may be curious about the sensor fusion algorithms created using the data collected by the HPDmobile system. In the context of this project, considerable effort went into developing suitable sensor fusion techniques. The final algorithms used isolation forests, convolutional neural networks, and spatiotemporal pattern networks to infer occupancy from the individual modalities. Although a single sensor can provide instantaneous evidence of occupancy, the absence of a sensor trigger at a given point in time does not necessarily indicate that the home is vacant, hence the need for a fusion framework. Our best fusion algorithm considers concurrent sensor readings as well as time-lagged occupancy predictions. Including the time-lagged predictions accounts for memory in the occupancy process and helps avoid highly problematic false negative predictions, which occur primarily at night when occupants are sleeping or reading. The adopted approach is hierarchical: instantaneous occupancy inferences form the basis for a higher-level inference following an autoregressive logistic regression process. The sensor fusion design we developed is one of many possible designs, and a purpose of publishing this data set is to encourage other researchers to pursue different designs.
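
To make the hierarchical idea concrete, the sketch below fits a logistic regression on concurrent per-modality probabilities plus time-lagged occupancy as autoregressive features. This is a minimal sketch, not the authors' exact design: the file name, column names, and lag depth are hypothetical.

```python
# Minimal sketch of an autoregressive logistic-regression fusion layer.
# Assumes per-modality occupancy probabilities (p_audio, p_image, p_env)
# have already been computed by lower-level classifiers; the file name,
# column names, and lag depth are hypothetical, not the authors' design.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("fused_probabilities_H1.csv")  # hypothetical input

# Use lagged occupancy as autoregressive features; at training time the
# ground truth stands in for the fused predictions that would be fed
# back at inference time.
for k in (1, 2, 3):
    df[f"occupied_lag{k}"] = df["occupied"].shift(k)
df = df.dropna()

features = ["p_audio", "p_image", "p_env",
            "occupied_lag1", "occupied_lag2", "occupied_lag3"]
model = LogisticRegression().fit(df[features], df["occupied"])
```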

The base requirements of the project were to capture (1) audio signals capable of conveying human speech (100 Hz to 4 kHz) and (2) monochrome images of at least 10,000 pixels. The other key system requirements were: (3) the ability to collect data from multiple locations in a house simultaneously, (4) low cost, and (5) operation independent of the residence's WiFi network. Commercial data acquisition systems such as the National Instruments CompactRIO (cRIO) were initially considered, but their cost was too high, especially once the modules required for wireless communication were added, so we chose to design our own system. Given recently introduced products such as the Delta Controls O3 sensor hub24, a custom-designed data acquisition system might not be necessary today. Two independent systems were built so that data could be captured from two households at the same time. The final systems, each called a mobile human presence detection system or HPDmobile, are based on Raspberry Pi single-board computers (referred to as SBCs in the rest of this article) acting as sensor hubs, and use inexpensive sensors and components intended for hobby electronics.

Each HPDmobile data acquisition system includes:

Five (5) sensor hubs, each containing environmental sensors, a microphone, and a camera

An industrial computer used as an on-site server

A wireless router connecting the on-site components

The sensor hubs run a Linux-based operating system and are used to collect and temporarily store the individual sensor readings. The sensors are connected to the SBC through a custom-designed printed circuit board (PCB), and the SBC provides 3.3 VDC power to all sensors. Each sensor hub connects to the on-site server via the wireless router, all of which are located in the monitored home. The on-site server is needed because of the limited storage capacity of the SBCs. The server runs a separate Linux-based virtual machine (VM) for each sensor hub. Data captured on a sensor hub are periodically transmitted wirelessly to the paired VM and stored in the home for the duration of the test. For a diagram of the hardware and network connections, see Fig. 1a. The SBCs are connected to wall-powered batteries that act as uninterruptible power supplies, providing temporary power during short outages (they have a 7-hour capacity). The batteries also help during system setup, because candidate sensor hub locations can be evaluated by monitoring the camera output before committing to a power cord location. For images of a complete sensor hub and a populated circuit board with sensors, see Fig. 1b,c. The cost of building and running each system came to about US$3,600: each hub cost about US$200, the router and server together cost US$2,300, and each router incurred a service cost of US$25 per month. Research, design, and testing of the system took six months, and data collection with the two systems lasted one year.

(a) System architecture, hardware components, and network connections of the HPDmobile data acquisition system. (b) A final sensor hub installed in a home (connected to an external battery). (c) The custom-designed printed circuit board with sensors. The sensors, clockwise from the upper right, are: camera, microphone, light, temperature/humidity, gas (eCO2 and TVOC), and distance.

To ensure accuracy, ground truth occupancy was collected in two ways. First, geofencing was deployed for all test homes, through the If-This-Then-That (IFTTT) software application installed on the occupants' mobile phones. During the test in their home, each resident was asked to carry a mobile phone with GPS location enabled when leaving the house. When they crossed the perimeter around the house, the IFTTT application triggered and registered the event type (exit or entry), the user, and a timestamp for the event. In addition to the digital records, each home kept a paper log, in which residents signed in and out as they entered and left the house. At the end of a collection period, the two logs (paper and digital) were reconciled, and any discrepancies or questionable entries were checked against the data or with the residents. Because of difficulties with the mobile phones, some residents ended up relying entirely on the paper system. The logs of all residents and guests were merged to produce a binary occupied/vacant status for the whole house.
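
As an illustration of how such logs can be merged, the sketch below converts per-person entry/exit events into a whole-house binary series. The event file and column layout are hypothetical, and the authors' actual reconciliation also involved manual checks.

```python
# Minimal sketch of merging per-person entry/exit events into a binary
# whole-house occupancy series; the file and column names are hypothetical.
import pandas as pd

events = pd.read_csv("ifttt_events_H1.csv", parse_dates=["timestamp"])
# Expected columns: timestamp, user, event ("enter" or "exit").
events["delta"] = events["event"].map({"enter": 1, "exit": -1})
events = events.sort_values("timestamp").set_index("timestamp")

count = events["delta"].cumsum()       # running head count of occupants
occupied = (count > 0).astype(int)     # 1 = occupied, 0 = vacant

# Align with the 10-second grid used by the environmental readings.
occupied = occupied.resample("10S").ffill()
```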

Since the data collection involved human subjects, all steps of the process were approved by an Institutional Review Board (IRB). This means that a Human Subjects Research (HSR) protocol was in place before any data collection began, ensuring that strict protocols were followed in the collection and use of the data. Subsequent review meetings confirmed that the HSR protocol was implemented as required. After the processing methods were finalized, additional IRB approval was sought and granted to publicly release the data set. As part of the IRB approval process, all subjects consented to the collection and distribution of the data after the privacy-preservation methods were applied.

Test subjects were recruited from graduate students in the architectural engineering department of the testing university, as well as faculty and staff in the Colorado Front Range area. Test homes were chosen to represent a variety of living arrangements and lifestyles. The homes tested included detached single-family houses and apartments in large and small complexes. Residents spanned a range of ages and relationships, including couples, roommates, and a family whose adult children were at home during part of the test. Due to IRB restrictions, households with children under 18 were not included. Three of the six households had pets: indoor and outdoor cats and one dog. For a summary of the selected homes, see Table 2.

The number of sensor hubs deployed in a home varied from four to six, depending on the size of the living space. Hubs were placed only in common areas, such as the living room and kitchen; out of consideration for occupant privacy, no hub was placed in or near bathrooms or bedrooms. Ideal hub locations were determined through conversations with residents about their typical usage patterns. The goal was to cover all entry and exit points, as well as all "hanging out" areas. Hubs were placed next to or facing the front door, as well as the living room, dining room, family room, and kitchen. Several of the larger houses had multiple common areas; in these cases the sensors were more dispersed, with little overlap between observed areas. Smaller homes had more compact common spaces, so the covered areas overlapped more. See Fig. 2 for home layouts marked with the sensor hub locations.

Home layouts and sensor placements. A hub outlined in blue with a blue arrow indicates that the hub is located above a doorway and angled slightly downward. (a) H1: main level of a three-story house. (b) H2: complete apartment layout. (c) and (d) H3: main and top floors of a three-story house, respectively. (e) H4: main floor of a two-story apartment. (f) H5: complete apartment layout. (g) H6: main floor of a studio apartment with a loft bedroom.

All data were captured in 2019 and therefore do not reflect changes in occupancy due to the COVID-19 global pandemic. Although the data collection period was relatively normal, home occupancy was quite high (47% to 82% of total time occupied). This is most likely due to the relative homogeneity of the test subjects: many were graduate students with atypical schedules, and at least one worked entirely from home. The two homes with only one occupant had the lowest occupancy rates, as these cases had no overlapping schedules. The higher occupancy of homes with pets may be because pet owners need to be home more often, but this may simply be coincidence. We cannot rule out that occupants changed their behavior because they knew they were being monitored, but such knowledge seems unlikely to increase occupancy.

After collection, the data were processed in several ways. First, a small amount of processing was performed to facilitate moving data off the on-site servers. Next, the data were verified and checked for integrity. Finally, to protect the privacy of the research participants, the audio was anonymized and the images were downsized. This section describes all processing performed on the data before public release.

Although the data acquisition system was initially configured to collect images at 336 × 336 pixels, this was deemed much larger than the resolution required for the ARPA-E project and much larger than could be publicly released. To aid retrieval of images from the on-site servers and later storage, the images were downsized to 112 × 112 pixels and the brightness of each image, defined as the average pixel value, was computed. Images with an average pixel value below 10 were considered "dark" and were not transferred from the server. Figure 3 compares four images from one hub, giving the average pixel value of each; it is apparent that images with an average pixel value below 10 are of little use for inference tasks and can safely be ignored. With the exception of H2, the timestamps of these dark images were recorded in text files and are included in the final data set, so that dark images can be distinguished from images lost to system failures. Since the hubs collect images 24 hours a day, dark images account for a large portion of the total collection, and omitting them significantly reduces the size of the data set. The ratios of dark images to total images, and of missing images per day, were computed for all hubs in all households and are summarized in Table 3.

Four images from the same sensor hub, comparing the relative "brightness" of the images, as described by the average pixel value. The displayed images are 112 × 112 pixels. (a) Average pixel value: 106. (b) Average pixel value: 43. (c) Average pixel value: 32. (d) Average pixel value: 10.
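
A minimal sketch of the dark-image screen described above, assuming a grayscale PNG on disk; the threshold of 10 is from the text, and the file path simply reuses the naming-convention example given later.

```python
# Minimal sketch of the dark-image screen: images whose average pixel
# value falls below 10 are logged and not transferred off the server.
import numpy as np
from PIL import Image

def is_dark(path: str, threshold: float = 10.0) -> bool:
    """True if the image's average pixel value is below the threshold."""
    pixels = np.asarray(Image.open(path).convert("L"), dtype=float)
    return pixels.mean() < threshold

if is_dark("2019-11-09_151604_RS1_H1.png"):
    print("dark image: record timestamp, skip transfer")
```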

To protect the privacy of residents and remove personally identifiable information (PII), the images were further downsized, from 112 × 112 pixels to 32 × 32 pixels, using bilinear interpolation. This process fixes the pixel values at the edges of the image and takes weighted averages of the interior pixels to map from the original size to the target size, so each new pixel value is a linear combination of original values. When the target size is smaller than the original, the result is effectively a blurred image. Figure 4 shows four examples of original images (at the 336 × 336 pixel capture size) and the resulting downsized images (at 32 × 32 pixels). Note that these images are of one of the researchers and her partner, both of whom agreed to the use of their likenesses in this data descriptor. The downsizing is irreversible, so the original image detail cannot be recovered; nevertheless, we believe the downsized images retain significant value. To increase the usefulness of the images, region-based labels are provided for them. The technical validation section describes the methods used to generate and check these labels.

(a–d) Originally captured 336 × 336 pixel images. (e–h) The same images, downsized to 32 × 32 pixels. Note that these images are of one of the researchers and her partner, both of whom agreed to the use of their likenesses in this data descriptor.
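
The downsizing step can be reproduced with the Pillow (PIL) Image module that the authors used (see the code availability paragraph); the snippet below is a minimal sketch with a hypothetical input path.

```python
# Minimal sketch of the privacy-preserving downsizing with bilinear
# interpolation, using the Pillow (PIL) Image module; paths hypothetical.
from PIL import Image

img = Image.open("2019-11-09_151604_RS1_H1.png")       # 112 x 112 source
small = img.resize((32, 32), resample=Image.BILINEAR)  # blurred result
small.save("2019-11-09_151604_RS1_H1_32x32.png")
```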

Audio files were processed in multiple steps to remove intelligible speech. Within each 10-second audio file, the signal was first mean-shifted and then full-wave rectified. Finally, the signal was downsampled by a factor of 100, and the resulting audio signal was stored as a CSV file. See Fig. 5 for a visualization of the audio processing steps. This series of operations lets us retain features of the original audio signal while hiding the speaker's identity and ensuring that nothing spoken is recognizable.

Audio processing steps performed on two audio files. In the "noisy" file, people moving about the space are identifiable, while in the "quiet" file there is no audible sound. (a) Original waveform sampled at 8 kHz. (b) Waveform after applying the mean shift. (c) Waveform after full-wave rectification. (d) Waveform after downsampling by an integer factor of 100.
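
A minimal sketch of the three processing steps, assuming a raw 10-second WAV clip is at hand. The text says only that the signal was "downsampled by a factor of 100", so the simple stride decimation below is an assumption about that operator.

```python
# Minimal sketch of the audio anonymization steps on one 10-second clip.
# The stride-based downsampling is an assumption; the text says only
# "downsampled by a factor of 100".
import numpy as np
from scipy.io import wavfile

rate, signal = wavfile.read("2019-10-18_002910_BS5_H5.wav")  # 8 kHz clip
signal = signal.astype(float)
signal = signal - signal.mean()   # (1) mean shift
signal = np.abs(signal)           # (2) full-wave rectification
processed = signal[::100]         # (3) downsample by a factor of 100
np.savetxt("2019-10-18_002910_BS5_H5.csv", processed, delimiter=",")
```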

Since the environmental readings are not considered privacy-invasive, no processing was needed to remove PII. The minimal processing of the environmental data simply merged the readings, originally captured in minute-level JSON files, and established a uniform sampling rate, because occasional errors in the data-writing process meant that timestamps did not always fall on exact 10-second increments. Timestamps were rounded to the nearest 10-second increment, and any duplicates produced by this rounding were deleted.
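
In pandas (the package used for the environmental data processing), the rounding and de-duplication might look like the following sketch; the file and column names are hypothetical.

```python
# Minimal sketch of the timestamp normalization: round to the nearest
# 10-second increment, drop duplicates, and reindex onto the 8,640-row
# daily grid; file and column names are hypothetical.
import pandas as pd

env = pd.read_csv("env_RS1_H1_2019-11-09.csv", parse_dates=["timestamp"])
env["timestamp"] = env["timestamp"].dt.round("10S")
env = env.drop_duplicates(subset="timestamp", keep="first")

grid = pd.date_range("2019-11-09", periods=8640, freq="10S")
env = env.set_index("timestamp").reindex(grid)   # missing rows stay blank
```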

The published data set is hosted on figshare25. The data for each household include the audio, image, and environmental modalities and the ground truth occupancy information, as well as the lists of dark images not included in the data set. Most data records are provided in compressed files organized by household and modality. Due to size limitations, images are packaged as one hub per compressed file, while the other modalities include all hubs in a single compressed file. Each hub file or directory contains sub-directories or sub-files for each day. See the folder structure diagram in Fig. 6, which shows example folders and files.

An example of the data records available for one household. The file type shows the top-level compressed file associated with the modality, while the example subfolder or file names highlight one possible path to the base-level data records in that folder. The structure gives the tree structure of the subdirectories, and the last entry in each section describes the data record type.

Audio and image files are stored in further subfolders organized by minute, with up to 1,440 minute-folders in each daily directory. Each audio minute-folder contains up to six CSV files, each representing a processed 10-second audio clip from one hub, and each image minute-folder contains up to 60 images in PNG format. The lists of dark images are stored in CSV files, organized by hub and date. Each daily CSV file lists all timestamps for that day where the average image brightness was below 10 and the image was therefore not included in the final data set. The region-based labels for the images are provided as CSV files, one file per day per hub.

Environmental data are stored in CSV files, one per hub per day of readings. In addition to the environmental readings shown in Table 1, the baseline measurements of TVOC and eCO2 collected by the gas sensor are included in these files. The ground truth for each household is stored in daily CSV files listing the (verified) binary occupancy status, where 1 means the house was occupied and 0 means it was vacant, along with the unverified occupant count (the estimated number of people home at that time).

When the environmental readings were consolidated, placeholder timestamps were generated for missing readings, so each daily CSV contains exactly 8,640 rows of data (plus a header row), although some entries are empty. Missing data appear as blank, unfilled cells in the CSV. Images and audio files that were not captured due to system failures have no placeholders in the data set, so the total number of subfolders and files varies from day to day. See Table 3 for the average number of files captured by each hub.

Because the data could be acquired with either of two different systems (HPDred or HPDblack), the sensor hubs are denoted by the color of their field server (red or black). For example, the first hub in the red system is called RS1, and the fifth hub in the black system is called BS5. These names did not change over the course of data collection, so RS3 in home H1 and RS3 in home H5 are the same physical hardware. The system used in each household depended on which was available at the time, and most of the data presented were ultimately collected with HPDred. The timestamp format is consistent across all data types and is given as YYYY-MM-DD HH:MM:SS, with 24-hour time. Thus the file named 2019-11-09_151604_RS1_H1.png is an image from sensor hub 1 (RS1) in H1, taken at 3:16:04 PM on November 9, 2019. Audio files are named for the start time of the 10-second clip, so the file named 2019-10-18_002910_BS5_H5.csv was captured by hub 5 (BS5) in H5 from 12:29:10 AM through 12:29:19 AM on October 18, 2019.
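
The naming convention can be parsed mechanically; the helper below is a hypothetical illustration, not part of the published code base.

```python
# Hypothetical helper (not part of the published code base) for parsing
# the <YYYY-MM-DD>_<HHMMSS>_<hub>_<home>.<ext> naming convention.
from datetime import datetime

def parse_name(filename: str):
    stem = filename.rsplit(".", 1)[0]
    date_str, time_str, hub, home = stem.split("_")
    timestamp = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H%M%S")
    return timestamp, hub, home

# (datetime(2019, 11, 9, 15, 16, 4), 'RS1', 'H1')
print(parse_name("2019-11-09_151604_RS1_H1.png"))
```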

Due to the presence of PII in the original high-resolution data (audio and images), and the fact that these data were acquired from private homes over long periods, publishing these modalities in their original form is impossible. However, we believe that the processing techniques applied to them retain the distinguishing features of human presence. For experimental results comparing the inference value of the original versus the processed audio and images, see the technical validation.

To confirm that markers of human presence can still be detected in the processed audio data, we trained and tested audio classifiers on a pre-labeled subset of the collected audio data, using both the unprocessed WAV files (called P0 files) and the CSV files that had undergone the processing steps described in the data processing section (called P1 files).

The original audio files were manually labeled "noisy" if human sounds were audible (such as talking, moving about, or cooking) or "quiet" if no human activity could be heard. Training and test sets were created by aggregating data from all hubs in a home, producing larger and more diverse sets. The final distributions of noisy and quiet files in each set were approximately equal, and the test set was randomly selected from the shuffled data using a 70/30 train/test split. Depending on the data type (P0 or P1), different post-processing steps were performed to standardize the data format. Classification was done using the k-nearest neighbors (k-NN) algorithm. For the classification performance on the two file types, see Table 4. The results show that although the predictive power of the processed data is slightly lower than that of the original data, a simple model can still detect human presence in most cases. The only exception is the data collected in H6, which showed notably lower test accuracy on the P1 data.
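
A minimal sketch of this experiment using scikit-learn (the package used for the technical validation); the feature files are hypothetical placeholders for the standardized P0/P1 post-processing, which the text does not fully specify.

```python
# Minimal sketch of the noisy/quiet audio experiment with a 70/30 split
# and a k-NN classifier; the feature files are hypothetical placeholders
# for the standardized P0/P1 post-processing, which the text leaves open.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X = np.load("audio_features_H1.npy")   # one feature vector per clip
y = np.load("audio_labels_H1.npy")     # 1 = noisy, 0 = quiet
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("test accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```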

To make the downsized images as useful as possible, we created region-based image labels specifying whether a person is visible in the frame of each image in the published data set. The pre-trained object detection algorithm You Only Look Once version 5 (YOLOv5)26 was used to classify the 112 × 112 pixel images as occupied or vacant. The YOLO algorithm uses a convolutional neural network (CNN) to produce the probability that a person appears in an image. The optimal cutoff threshold for classifying an image as occupied or vacant was found through cross-validation and is unique to each hub; the median cutoff was 0.3, with a range of 0.2 to 0.6. Images with a person probability above the cutoff were labeled occupied, and all others were labeled vacant. Two sets of images (labeled occupied and labeled vacant) were then randomly sampled, and researchers manually verified whether a person was present. From these verified samples we generated estimates of: the probability that a truly occupied image is identified correctly (sensitivity, or true positive rate); the probability that a truly vacant image is identified correctly (specificity, or true negative rate); the probability that an image labeled occupied is actually occupied (positive predictive value, or PPV); and the probability that an image labeled vacant is actually vacant (negative predictive value, or NPV). These are reported in Table 5, along with the numbers of actually occupied and actually vacant images sampled and the cutoff threshold used for each hub. The two sets of images were sampled in an attempt to obtain equal numbers of each type; due to misclassification by the algorithm, the numbers of actually occupied and actually vacant images differ by hub. Since the labeled image subsets were randomly sampled, they cover a variety of lighting scenarios; however, all images in the labeled subsets have average pixel values above the threshold of 10.
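
A minimal sketch of this labeling step, loading a pre-trained YOLOv5 model through PyTorch Hub. The cutoff shown is the reported median, and the low model confidence floor is an assumption so that the per-hub cutoff, rather than the model default, governs the decision.

```python
# Minimal sketch of the person-labeling step with a pre-trained YOLOv5
# model loaded through PyTorch Hub. The low model confidence floor is an
# assumption so that the per-hub cutoff governs the decision; 0.3 is the
# reported median cutoff.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
model.conf = 0.05   # keep low-confidence detections
CUTOFF = 0.3        # per-hub cross-validated cutoff (range 0.2 to 0.6)

results = model("2019-11-09_151604_RS1_H1.png")
det = results.xyxy[0]                    # x1, y1, x2, y2, confidence, class
person_conf = det[det[:, 5] == 0][:, 4]  # COCO class 0 is "person"
occupied = len(person_conf) > 0 and float(person_conf.max()) > CUTOFF
print("occupied" if occupied else "vacant")
```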

Through the sampling and manual verification, some misclassification patterns emerged. For example, false positives (the algorithm predicting a person in the frame when no one is present) seem to occur more often on cameras with large windows in the field of view, where lighting conditions change drastically. In some cases this led to a higher occupancy cutoff being selected during cross-validation, resulting in lower specificity and lower PPV. An example is shown in Fig. 7c, where the algorithm labeled a vacant image "occupied" at the cutoff threshold given in Table 5. In other cases, false negatives occurred more often when the camera had a long field of view and people remained far from the camera for a period of time. Examples are given in Fig. 7a,b, which were labeled vacant at the thresholds used.

Five images misclassified by the YOLOv5 labeling algorithm. (a) and (b) are examples of false negatives, where the image was labeled vacant at the threshold used (0.3 and 0.4, respectively). (c), (d), and (e) are examples of false positives, where the image was labeled occupied at the threshold used (0.5, 0.3, and 0.6, respectively). For each image, the region most likely to contain a person (as predicted by the algorithm) is shown in red, and the probability that the region contains a person, the home, and the sensor hub are given below each image. Both (d) and (e) highlight a cat as the most likely person location, which happened only rarely.

The YOLOv5 labeling algorithm proved very robust at rejecting pets. This may be because the version of the algorithm used was pre-trained on the Common Objects in Context (COCO) data set, which includes more than 10,000 instances of dogs and cats. Figure 8 shows two examples of correctly labeled images containing cats. Some households had more instances of false positives involving pets (see Fig. 7d,e), but in most cases the algorithm was good at distinguishing people from pets.

Two examples of images containing cats that were correctly labeled by the YOLOv5 algorithm.

As expected, image resolution has a significant effect on the detection accuracy of the algorithm: the higher the resolution, the higher the accuracy. To show the effect of resolution on accuracy, we ran the YOLOv5 algorithm on balanced, labeled data sets at various sizes (32 × 32 pixels to 128 × 128 pixels) and compared the accuracy (defined as the number of correct identifications divided by the total number classified) across households. The results are shown in Fig. 9. To generate the different image sizes, the 112 × 112 images were either downsized using bilinear interpolation or enlarged by padding with a white border to the required size. Since higher resolutions do perform better, the ground truth labeling was performed at the larger size (112 × 112) rather than the 32 × 32 size published in the data set.

The effect of image resolution on the prediction accuracy of the YOLOv5 algorithm. Dots show the average prediction accuracy of the algorithm for a roughly balanced set of labeled images from each household, and error bars give the standard deviation of all observations within the household.
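
A minimal sketch of generating the test resolutions under the stated rules (bilinear downsizing below the native 112 × 112, white-border padding above it); the path is hypothetical.

```python
# Minimal sketch of generating the test resolutions: bilinear downsizing
# below the native 112 x 112, white-border padding above it.
from PIL import Image, ImageOps

def to_size(img: Image.Image, target: int) -> Image.Image:
    if target <= img.width:
        return img.resize((target, target), resample=Image.BILINEAR)
    pad = (target - img.width) // 2
    return ImageOps.expand(img, border=pad, fill=255)  # white border

img = Image.open("2019-11-09_151604_RS1_H1.png")  # 112 x 112
variants = {s: to_size(img, s) for s in (32, 64, 96, 112, 128)}
```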

The sensors used were chosen for their ease of integration with the Raspberry Pi sensor hubs. Most sensors use the I2C communication protocol, which allows the hub to sample from multiple sensors at once. At the time the system was developed, all were inexpensive and readily available to the public. Refer to Table 6 for sensor model details. In most cases, sensor accuracy was traded against system cost and ease of deployment, which reduces the reliability of the environmental measurements. However, trends in the data are still evident, and changes in home status are easily detected; see Fig. 10 for a 24-hour sample of environmental data and occupancy. The sensors were tested in the laboratory before installation in the first home, to verify stable and consistent readings, but no formal calibration of the sensors was performed.

Time series of environmental readings and occupancy status for one day (November 3, 2019) in H6.

The reliability of environmental data collection (system performance) was quite good, with capture rates above 95% for most modalities. The temperature and humidity sensor had more dropped points than the other environmental modalities, with a capture rate around 90%. Because temperature and humidity change slowly in response to human presence, researchers can accurately interpolate the missing data points if needed.
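
If interpolation is desired, a time-based fill over short gaps is one option, sketched below with hypothetical file and column names; this is left to the researcher's discretion and is not a processing step applied to the released data.

```python
# Minimal sketch of filling short gaps in the slowly varying temperature
# and humidity series; file and column names are hypothetical.
import pandas as pd

env = pd.read_csv("env_RS1_H1_2019-11-09.csv",
                  parse_dates=["timestamp"], index_col="timestamp")
# Fill gaps of up to one minute (six 10-second samples).
env[["temperature", "humidity"]] = env[["temperature", "humidity"]] \
    .interpolate(method="time", limit=6)
```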

The TVOC and eCO2 sensor is a metal oxide gas sensor with on-board calibration, which executes at startup and periodically thereafter; it reports eCO2 and TVOC relative to a known baseline (also recorded by the system). The temperature and humidity sensor is a digital sensor based on a capacitive humidity element and a thermistor; it is calibrated before shipment, and readings are reported according to calibration coefficients stored in on-board memory. The illuminance sensor uses a broadband photodiode and an infrared photodiode and converts the analog signals to digital on-board, approximating the human eye's response to light levels.

Carbon dioxide sensors are known to be unreliable27. Although increases in the readings correlate with the presence of people in a room, the recorded eCO2 values may be higher than the actual values. The spatial overlap within covered areas (i.e., rooms where multiple sensor hubs were installed) can be used to cross-validate the temperature, humidity, eCO2, and TVOC readings.

In addition to the environmental sensors described above, the sensor hubs also included a distance sensor using time-of-flight technology, which should report the distance to the nearest object up to 4 m. However, the range it can actually report depends on the internal mode selection and is strongly affected by ambient light levels. For example, in the long-range sensing mode, the sensor can report distances up to 360 cm in a dark environment, but only up to 73 cm under strong light28. Distance measurements were therefore considered unreliable across the varied settings monitored and are not included in the final data set.

The ground truth was verified using image detection algorithms developed by the team. After training the high-accuracy image classifiers used in the ARPA-E SENSOR project, these algorithms were applied to the complete image collection to generate a binary decision for each image, declaring the frame occupied or vacant. These predictions were compared with the collected ground truth data, and all instances of false positives were identified. False positives (i.e., the classifier believing someone was in an image while the ground truth indicated the house was vacant) potentially represent mislabeled points. Researchers flagged and inspected the images from these periods. If a time point was indeed labeled incorrectly, the researchers attempted to find the cause (usually entry or exit records that were off by a few minutes) and corrected the ground truth. Virtually all homes had a camera facing the front door, which made it easier to correct these situations once discovered. False negatives were not verified in a similar way, because false negatives in the images (i.e., someone is home but no camera sees them) are very common: the system ran 24 hours a day and people were not always in a room with a camera.

Each home was scheduled to be tested for four consecutive weeks. Data collection was checked roughly once a day, through on-site visits or remote access. Due to technical challenges, the test time in some households was extended to allow more uninterrupted data collection, so data collection continued for up to eight weeks in some homes. Individual sensor errors and the complexity of the data collection process caused some blocks of data to be lost. The final published data were selected to maximize the amount of data available over continuous periods. The data for H1, H2, and H5 each form one continuous period, while the data for H3, H4, and H6 comprise two continuous periods each. The audio sensors had the lowest capture rates, for a number of reasons. In one hub (BS2) in H6, audio was not captured at all, while in another hub (RS2 in H5), audio and environmental data were not captured for most of the collection period. Overall, during the released time periods, the audio collection rate was 87% and the environmental reading collection rate was 89%; disregarding the two hubs missing a modality, both rates are above 90%. The images had high collection reliability, with a total image capture rate of 98% during the released periods. Dark images (not included in the data set) accounted for 19-40% of captured images, depending on the household. See Table 3 for a summary of collection reliability, broken down by modality, hub, and household.

For transparency and reproducibility, we are providing a small portion of the original audio and image data (three days from one home) upon request. Interested researchers should contact the corresponding author for access to these data.

All code used to collect, process, and verify the data is written in Python and is available for download29 (https://github.com/mhsjacoby/HPDmobile). All image processing was done with the Python Imaging Library (Pillow/PIL)30, Image module, version 7.2.0. Audio processing used the SciPy31 io module, version 1.5.0. The pandas package32, version 1.0.5, was used extensively in the environmental data processing. The code base developed for collecting data with the HPDmobile system uses a standard client-server model, in which the sensor hub is the server and the VM is the client. Note that the term "server" in this context refers to the SBC (sensor hub), not the on-site server running the VMs mentioned above. All collection code on the clients and servers is written in Python and runs on Linux systems. The technical validation of the audio and images was done with scikit-learn33, version 0.24.1, and YOLOv526, version 3.0, in Python.

1. U.S. Energy Information Administration. Monthly Energy Review. https://www.eia.gov/totalenergy/data/monthly/archive/00352104.pdf (2021).

2. U.S. Energy Information Administration. Residential Energy Consumption Survey (RECS). https://www.eia.gov/consumption/residential/data/2015/ (2015).

3. Lu, J. et al. The smart thermostat: using occupancy sensors to save energy in homes. Proceedings of the 8th ACM Conference on Embedded Networked Sensor Systems, 211–224 (2010).

4. Gao, G. & Whitehouse, K. The self-programming thermostat: optimizing setback schedules based on home occupancy patterns. BuildSys '09, 67–72 (2009).

5. Soltanaghaei, E. & Whitehouse, K. WalkSense: classifying home occupancy states using walkway sensing. Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments, 167–176 (2016).

6. Turley, C., Jacoby, M., Pavlak, G. & Henze, G. Development and evaluation of occupancy-aware HVAC control for residential building energy efficiency and occupant comfort. Energies 13, 5396 (2020).

7. Wang, F. et al. Predictive control of indoor environment using occupant number detected by video data and CO2 concentration. Energy and Buildings 145, 155–162 (2017).

8. Sun, K., Zhao, Q. & Zou, J. A review of building occupancy measurement systems. Energy and Buildings 216, 109965 (2020).

9. Saha, H., Florita, A. R., Henze, G. P. & Sarkar, S. Occupancy sensing in buildings: a review of data analytics approaches. Energy and Buildings 188–189, 278–285 (2019).

10. Seidel, R., Apitzsch, A. & Hirtz, G. Improved person detection on omnidirectional images with non-maxima suppression. https://arxiv.org/abs/1805.08503 (2018).

11. Hobson, B. W., Lowcay, D., Gunay, H. B., Ashouri, A. & Newsham, G. R. Opportunistic occupancy-count estimation using sensor fusion: a case study. Building and Environment 159, 106154 (2019).

12. Howard, B., Acha, S., Shah, N. & Polak, J. Implicit sensing of building occupancy count with information and communication technology data sets. Building and Environment 157, 297–308 (2019).

13. Dodier, R. H., Henze, G. P., Tiller, D. K. & Guo, X. Building occupancy detection through sensor belief networks. Energy and Buildings 38, 1033–1043 (2006).

14. Yang, J., Santamouris, M. & Lee, S. E. Review of occupancy sensing systems and occupancy modeling methodologies for the application in institutional buildings. Energy and Buildings 121, 344–349 (2016).

15. Huchuk, B., Sanner, S. & O'Brien, W. Comparison of machine learning models for occupancy prediction in residential buildings using connected thermostat data. Building and Environment 160, 106177 (2019).

16. Candanedo, L. M. & Feldheim, V. Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models. Energy and Buildings 112, 28–39 (2016).

17. Kleiminger, W., Beckel, C. & Santini, S. Household occupancy monitoring using electricity meters. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 975–986 (2015).

18. ecobee. DYD researcher handbook. https://www.ecobee.com/wp-content/uploads/2017/01/DYD_Researcher-handbook_R7.pdf (2017).

19. del Blanco, C. R., Carballeira, P., Jaureguizar, F. & García, N. Robust people indoor localization with omnidirectional cameras using a grid of spatial-aware classifiers. Signal Processing: Image Communication 93, 116135 (2021).

20. Figueira, D., Taiana, M., Nambiar, A., Nascimento, J. & Bernardino, A. The HDA+ data set for research on fully automated re-identification systems. Computer Vision - ECCV 2014 Workshops, 241–255 (2015).

21. Change Loy, C., Gong, S. & Xiang, T. From semi-supervised to transfer counting of crowds. IEEE International Conference on Computer Vision (ICCV) (2013).

22. Sangogboye, F. C., Jia, R., Hong, T., Spanos, C. & Kjærgaard, M. B. A framework for privacy-preserving data publishing with enhanced utility for cyber-physical systems. ACM Transactions on Sensor Networks 14 (2018).

23. ARPA-E. SENSOR: Saving Energy Nationwide in Structures with Occupancy Recognition. https://arpa-e.energy.gov/news-and-media/press-releases/arpa-e-announces-funding-opportunity-reduce-energy-use-buildings (2017).

24. Microsoft Corporation, Delta Controls & ICONICS. Monitoring occupancy with Delta Controls O3 Sense, Azure IoT and ICONICS. https://deltacontrols.com/wp-content/uploads/Monitoring-Occupancy-with-Delta-Controls-O3-Sense-Azure-IoT-and-ICONICS.pdf (2021).

25. Jacoby, M., Tan, S. Y., Henze, G. & Sarkar, S. HPDmobile: a high-fidelity residential building occupancy detection dataset. figshare https://doi.org/10.6084/m9.figshare.c.5364449 (2021).

26. Jocher, G. et al. ultralytics/yolov5: v4.0 - nn.SiLU() activations, Weights & Biases logging, PyTorch Hub integration. Zenodo https://doi.org/10.5281/zenodo.4418161 (2021).

27. Fisk, W. J., Faulkner, D. & Sullivan, D. P. Accuracy of CO2 sensors. IAQ Applications 9 (2008).

28. STMicroelectronics. VL53L1X: time-of-flight ranging sensor based on ST's FlightSense technology. https://www.st.com/resource/en/datasheet/vl53l1x.pdf (2018).

29. Jacoby, M., Tan, S. Y. & Mosiman, C. mhsjacoby/HPDmobile: v1.0.1-alpha. Zenodo https://doi.org/10.5281/zenodo.4655276 (2021).

30. van Kemenade, H. et al. python-pillow/Pillow: 8.3.1. Zenodo https://doi.org/10.5281/zenodo.5076624 (2021).

31. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 17, 261–272 (2020).

32. The pandas development team. pandas-dev/pandas: Pandas. Zenodo https://doi.org/10.5281/zenodo.3509134 (2021).

33. Pedregosa, F. et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, 2825–2830. http://jmlr.org/papers/v12/pedregosa11a.html (2011).

The research presented in this work was funded by the Advanced Research Projects Agency-Energy (ARPA-E) under award number DE-AR0000938. The authors would like to thank the following people: Cory Mosiman, who was instrumental in setting up the data collection system; Hannah Blake and Christina Turley, for their help with the data collection procedures; Jasmine Garland, who helped develop the labeled data set for the technical validation; and the occupants of the six monitored homes, for letting us intrude into their lives.

Department of Civil, Environmental, and Architectural Engineering, University of Colorado Boulder, Boulder, 80309-0428, USA

Margaret Jacoby and Gregor Henze

Department of Mechanical Engineering, Iowa State University, Ames, 50011, USA

Sin Yong Tan & Soumik Sarkar

National Renewable Energy Laboratory, Golden, 80401, USA

Renewable and Sustainable Energy Institute, Boulder, 80309, USA


G.H. and S.S. conceived and supervised the experiments. M.J. created the data collection system, performed all data collection tasks, processed and verified the collected data, and wrote the manuscript. S.Y.T. assisted in developing the processing techniques and performed some of the technical validation. All authors reviewed the manuscript.

The authors declare no competing interests.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

Jacoby, M., Tan, S. Y., Henze, G. et al. A high-fidelity residential building occupancy detection dataset. Scientific Data 8, 280 (2021). https://doi.org/10.1038/s41597-021-01055-x


